503 research outputs found

    Big data space fungus

    Get PDF

    Bulkloading and Maintaining XML Documents

    Get PDF
    The popularity of XML as a exchange and storage format brings about massive amounts of documents to be stored, maintained and analyzed -- a challenge that traditionally has been tackled with Database Management Systems (DBMS). To open up the content of XML documents to analysis with declarative query languages, efficient bulk loading techniques are necessary. Database technology has traditionally been offering support for these tasks but yet falls short of providing efficient automation techniques for the challenges that large collections of XML data raise. As storage back-end, many applications rely on relational databases, which are designed towards large data volumes. This paper studies the bulk load and update algorithms for XML data stored in relational format and outlines opportunities and problems. We investigate both (1) bulk insertion and deletion as well as (2) updates in the form of edit scripts which heavily use pointer-chasing techniques which often are considered orthogonal to the algebraic operations relational databases are optimized for. To get the most out of relational database systems, we show that one should make careful use of edit scripts and replace them with bulk operations if more than a very small portion of the database is updated. We implemented our ideas on top of the Monet Database System and benchmarked their performance

    Memory aware query scheduling in a database cluster

    Get PDF
    Query throughput is one of the primary optimization goals in interactive web-based information systems in order to achieve the performance necessary to serve large user communities. Queries in this application domain differ significantly from those in traditional database applications: they are of lower complexity and almost exclusively read-only. The architecture we propose here is specifically tailored to take advantage of the query characteristics. It is based on a large parallel shared-nothing database cluster where each node runs a separate server with a fully replicated copy of the database. A query is assigned and entirely executed on one single node avoiding network contention or synchronization effects. However, the actual key to enhanced throughput is a resource efficient scheduling of the arriving queries. We develop a simple and robust scheduling scheme that takes the currently memory resident data at each server into account and trades off memory re-use and execution time, reordering queries as necessary. Our experimental evaluation demonstrates the effectiveness when scaling the system beyond hundreds of nodes showing super-linear speedup

    Omega Omega -storage : a self organizing multi-attribute storage technique for very large main memories

    Get PDF
    Main memory is continuously improving both in price and capacity. With this comes new storage problems as well as new directions of usage. Just before the millennium, several main memory database systems are becoming commercially available. The hot areas include boosting the performance of web-enabled systems, such as search-engines, and auctioning systems. We present a novel data storage structure -- the {em OmegaOmega-storage structure, a high performance data structure, allowing automatically indexed storage of {em very large amounts of multi-attribute data. The experiments show excellent performance for point retrieval, and highly efficient pruning for {em pattern searches. It provides the balanced storage previously achieved by random kd-trees, but avoids their increased pattern match search times, by an effective assignment bits of attributes. Moreover, it avoids the sensitivity of the kd-tree to insert orders

    Scalable storage for a DBMS using transparent distribution

    Get PDF
    Scalable Distributed Data Structures (SDDSs) provide a self-managing and self-organizing data storage of potentially unbounded size. This stands in contrast to common distribution schemas deployed in conventional distributed DBMS. SDDSs, however, have mostly been used in synthetic scenarios to investigate their properties. In this paper we concentrate on the integration of the LH* SDDS into our efficient and extensible DBMS, called Monet. We show that this merge permits processing very large sets of distributed data. In our implementation we extended the relational algebra interpreter in such a way that access to data, whether it is distributed or locally stored, is transparent to the user. The on-the-fly optimization of operations --- heavily used in Monet --- to deploy different strategies and scenarios inside the primary operators associated with an SDDS adds self-adaptiveness to the query system; it dynamically adopts itself to unforeseen situations. We illustrate the performance efficiency by experiments on a network of workstations. The transparent integration of SDDSs opens new perspectives for very large self-managing database systems

    Region-based indexing in an image database

    Get PDF
    Image retrieval systems based on the image-query-by-example paradigm locate their answer set using a similarity measure of the query image with all images stored in the database. Although this approach generally works for quick re-location of `identical' or partly occluded images, it does not support the more interesting query type aimed at finding images with a particular image fragment. In this paper we introduce a regionbased indexing scheme to support retrieval of images on the basis of both global and local image features

    Stethoscope: A platform for interactive visual analysis of query execution plans

    Get PDF
    Searching for the performance bottleneck in an execution trace is an error prone and time consuming activity. Existing tools oer some comfort by providing a visual representation of trace for analysis. In this paper we present the Stethoscope, an interactive visual tool to inspect and analyze columnar database query performance, both online and online. It's unique interactive animated interface capitalizes the large dataflow graph representation of a query execution plan, augmented with query execution trace information. We demonstrate features of Stethoscope for both online and online analysis of long running queries. It helps in understanding where time goes, how optimizers perform, and how parallel processing on multi-core systems is exploited

    Meet Charles, big data query advisor

    Get PDF
    In scientific data management and business analytics, the most informative queries are a holy grail. Data collection becomes increasingly simpler, yet data exploration gets significantly harder. Exploratory querying is likely to return an empty or an overwhelming result set. On the other hand, data mining algorithms require extensive preparation, ample time and do not scale well. In this paper, we address this challenge at its core, i.e., how to query the query space associated with a given database. The space considered is formed by conjunctive predicates. To express them, we introduce the Segmentation Description Language (SDL). The user provides a query. Charles, our query advisory system, breaks its extent into meaningful segments and returns the subsequent SDL descriptions. This provides insight into the set described and offers the user directions for further exploration. We introduce a novel algorithm to generate SDL answers. We evaluate them using four orthogonal criteria: homogeneity, simplicity, breadth, and entropy. A prototype implementation has been constructed and the landscape of follow-up research is sketched

    Cracking the database store

    Get PDF
    Query performance strongly depends on finding an execution plan that touches as few superfluous tuples as possible. The access structures d
    • …
    corecore